AITopics | global optima

Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers

Neural Information Processing SystemsJun-16-2026, 16:28:18 GMT

The empirical emergence of neural collapse--a surprising symmetry in the feature representations of the training data in the penultimate layer of deep neural networks--has spurred a line of theoretical research aimed at its understanding. However, existing work focuses on data-agnostic models or, when data structure is taken into account, it remains limited to multi-layer perceptrons. Our paper fills both these gaps by analyzing modern architectures in a data-aware regime: we prove that global optima of deep regularized transformers and residual networks (ResNets) with LayerNorm trained with cross entropy or mean squared error loss are approximately collapsed, and the approximation gets tighter as the depth grows. More generally, we formally reduce any end-to-end large-depth ResNet or transformer training into an equivalent unconstrained features model, thus justifying its wide use in the literature even beyond data-agnostic settings. Our theoretical results are supported by experiments on computer vision and language datasets showing that, as the depth grows, neural collapse indeed becomes more prominent.

artificial intelligence, machine learning, neural collapse, (17 more...)

Neural Information Processing Systems

Country: North America (0.46)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Add feedback

Implicit Regularization in Matrix Factorization

Suriya Gunasekar, Blake E. Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nati Srebro

Neural Information Processing SystemsApr-23-2026, 05:37:47 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, gradient descent, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

The non-convex Burer-Monteiro approach works on smooth semidefinite programs

Neural Information Processing SystemsMar-23-2026, 07:16:17 GMT

Semidefinite programs (SDPs) can be solved in polynomial time by interior point methods, but scalability can be an issue. To address this shortcoming, over a decade ago, Burer and Monteiro proposed to solve SDPs with few equality constraints via rank-restricted, non-convex surrogates. Remarkably, for some applications, local optimization methods seem to converge to global optima of these non-convex surrogates reliably. Although some theory supports this empirical success, a complete explanation of it remains an open question. In this paper, we consider a class of SDPs which includes applications such as max-cut, community detection in the stochastic block model, robust PCA, phase retrieval and synchronization of rotations.

artificial intelligence, machine learning, sdp, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Mixed Linear Regression with Multiple Components

Neural Information Processing SystemsMar-17-2026, 09:28:59 GMT

In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is {\em locally strongly convex} in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexity region. We then employ general convex optimization algorithms to minimize the objective function. To the best of our knowledge, our approach provides first exact recovery guarantees for the MLR problem with $K \geq 2$ components. Moreover, our method has near-optimal computational complexity $\tilde O (Nd)$ as well as near-optimal sample complexity $\tilde O (d)$ for {\em constant} $K$. Furthermore, we show that our non-convex formulation can be extended to solving the {\em subspace clustering} problem as well. In particular, when initialized within a small constant distance to the true subspaces, our method converges to the global optima (and recovers true subspaces) in time {\em linear} in the number of points. Furthermore, our empirical results indicate that even with random initialization, our approach converges to the global optima in linear time, providing speed-up of up to two orders of magnitude.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

573f7f25b7b1eb79a4ec6ba896debefd-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-12-2026, 06:12:00 GMT

bethe free energy, global optima, reviewer, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.40)

Add feedback

A.1BayesianOptimisation

Neural Information Processing SystemsFeb-9-2026, 12:57:31 GMT

Gaussian processes for machine learning, volume 2. MIT pressCambridge,MA,2006.

artificial intelligence, experiment, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

45f7927942098d14e473fc5d000031e2-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 16:15:22 GMT

algorithm, application, regularizer, (16 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
Europe > Finland > Pirkanmaa > Tampere (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(2 more...)

Add feedback

Neural Temporal-Difference Learning Converges to Global Optima

Neural Information Processing SystemsDec-25-2025, 18:28:12 GMT

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to nonconvexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation. In particular, we show how such global convergence is enabled by the overparametrization of neural networks, which also plays a vital role in the empirical success of neural TD. Beyond policy evaluation, we establish the global convergence of neural (soft) Q-learning, which is further connected to that of policy gradient algorithms.

global convergence, name change, neural temporal-difference learning converge, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

An improved clustering-based multi-swarm PSO using local diversification and topology information

Matanga, Yves, Sun, Yanxia, Wang, Zenghui

arXiv.org Artificial IntelligenceNov-25-2025

Multi-swarm particle optimisation algorithms are gaining popularity due to their ability to locate multiple optimum points concurrently. In this family of algorithms, clustering-based multi-swarm algorithms are among the most effective techniques that join the closest particles together to form independent niche swarms that exploit potential promising regions. However, most clustering-based multi-swarms are Euclidean distance-based and only inquire about the potential of one peak within a cluster and thus can lose multiple peaks due to poor resolution. In a bid to improve the peak detection ratio, the current study proposes two enhancements. First, a preliminary local search across initial particles is proposed to ensure that each local region is sufficiently scouted prior to particle collaboration. Secondly, an investigative clustering approach that performs concavity analysis is proposed to evaluate the potential for several sub-niches within a single cluster. An improved clustering-based multi-swarm PSO (TImPSO) has resulted from these enhancements and has been tested against three competing algorithms in the same family using the IEEE CEC2013 niching datasets, resulting in an improved peak ratio for almost all the test functions.

artificial intelligence, evolutionary algorithm, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.17571

Country: Africa > South Africa (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)

Add feedback

Mixed Linear Regression with Multiple Components

Neural Information Processing SystemsNov-21-2025, 14:57:38 GMT

In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is {\em locally strongly convex} in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexity region. We then employ general convex optimization algorithms to minimize the objective function. To the best of our knowledge, our approach provides first exact recovery guarantees for the MLR problem with $K \geq 2$ components. Moreover, our method has near-optimal computational complexity $\tilde O (Nd)$ as well as near-optimal sample complexity $\tilde O (d)$ for {\em constant} $K$. Furthermore, we show that our non-convex formulation can be extended to solving the {\em subspace clustering} problem as well. In particular, when initialized within a small constant distance to the true subspaces, our method converges to the global optima (and recovers true subspaces) in time {\em linear} in the number of points. Furthermore, our empirical results indicate that even with random initialization, our approach converges to the global optima in linear time, providing speed-up of up to two orders of magnitude.

linear regression, multiple component, name change, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Filters

Collaborating Authors

global optima

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers

Implicit Regularization in Matrix Factorization

The non-convex Burer-Monteiro approach works on smooth semidefinite programs

Mixed Linear Regression with Multiple Components

573f7f25b7b1eb79a4ec6ba896debefd-AuthorFeedback.pdf

A.1BayesianOptimisation

45f7927942098d14e473fc5d000031e2-Paper-Conference.pdf

Neural Temporal-Difference Learning Converges to Global Optima

An improved clustering-based multi-swarm PSO using local diversification and topology information

Mixed Linear Regression with Multiple Components